The mythical thermohaline oscillator?
The system discussed by Stommel (1987) and Welander (1982), in which heating and evaporation at the surface of the ocean are balanced by vertical turbulent mixing, is studied analytically and numerically for mixing laws appropriate to salt fingers, rather than mechanical turbulence. Stommel and Welander found for mechanically-driven turbulent mixing that a limit cycle of T and S exists (that is, T and S oscillate) in the presence of steady forcing. We find that the usual salt finger parameterizations, in which salinity flux coefficient and buoyancy flux ratio decrease with increasing density ratio, do not allow a limit cycle. This result holds whether the flux parameterization is for an interface using the “4/3 power law” laboratory relationships or in terms of vertical gradients. Rather, all initial conditions either evolve to a steady balance or lead to the upper layer becoming denser than the lower layer and overturning. In addition, we find that commonly used mechanical turbulence parameterizations for eddy diffusivity vs. Richardson number do not vary rapidly enough to allow a limit cycle in the Stommel/Welander model, although recent observations of equatorial turbulence do. Hence the possible existence of a limit cycle oscillation in evaporatively-driven areas of the ocean depends critically on the type of vertical mixing which occurs, and on the precise form of its parameterization.
Persistent Kernels for Iterative Memory-bound GPU Applications
Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU
implementations have a loop on the host side that invokes the GPU kernel once
per time/algorithm step. The termination of each kernel
implicitly acts as the barrier required after advancing the solution every time
step. We propose a scheme for running memory-bound iterative GPU kernels:
PERsistent KernelS (PERKS). In this scheme the time loop is moved inside a
persistent kernel, and device-wide barriers are used for synchronization. We
then reduce the traffic to device memory by caching a subset of the output in
each time step in registers and shared memory to be used as input for the
following time step. PERKS can be generalized to any iterative solver: they are
largely independent of the solver's implementation. We explain the design
principle of PERKS and demonstrate the effectiveness of PERKS for a wide range
of iterative 2D/3D stencil benchmarks (geometric mean speedup of x in
small domains and x in large domains), and a Krylov subspace solver
(geometric mean speedup of x in smaller SpMV datasets from SuiteSparse
and x in larger SpMV datasets, for conjugate gradient).
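The traffic argument in this abstract can be made concrete with a back-of-envelope model. The sketch below is not the PERKS implementation: it assumes each time step reads and writes every cell exactly once, ignores halo exchange and barrier cost, and the function names and the `cached_cells` parameter are hypothetical.

```python
def host_loop_traffic(n_cells, n_steps):
    """Elements moved to/from device memory when every step is its own kernel launch.

    Assumption: each launch reads the whole input and writes the whole output.
    """
    return n_steps * 2 * n_cells


def persistent_traffic(n_cells, n_steps, cached_cells):
    """Elements moved when the time loop runs inside one persistent kernel.

    Assumption: `cached_cells` cells stay in registers/shared memory between
    steps, so they are loaded once at the start and stored once at the end.
    """
    cached = min(cached_cells, n_cells)
    uncached = n_cells - cached
    one_time = 2 * cached                 # initial load + final store
    per_step = 2 * uncached               # still streamed every step
    return one_time + n_steps * per_step


if __name__ == "__main__":
    base = host_loop_traffic(1000, 10)
    perks = persistent_traffic(1000, 10, cached_cells=400)
    print(base, perks)
```

Under these assumptions the saving grows with the cached fraction and with the number of time steps, which is consistent with the abstract reporting separate results for small and large domains.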
Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt)
General Purpose Graphics Processing Units (GPGPU) are used in most of the top
systems in HPC. The total capacity of scratchpad memory has increased by more
than 40 times in the last decade. However, existing optimizations for stencil
computations using temporal blocking have not aggressively exploited the large
capacity of scratchpad memory. This work uses the 2D Jacobian 5-point iterative
stencil as a case study to investigate the use of large scratchpad memory.
Unlike existing research that tiles the domain in a thread block fashion, we
tile the domain so that each tile is large enough to utilize all available
scratchpad memory on the GPU. Consequently, we process several time steps
inside a single tile before offloading the result back to global memory. Our
evaluation shows that our performance is comparable to state-of-the-art
implementations, yet our implementation is much simpler and does not require
auto-generation of code. Comment: This short paper is published in the 15th
Workshop on General Purpose Processing Using GPU (GPGPU 2023).
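Processing several time steps inside one tile before writing back can be illustrated with a small CPU sketch. This is generic overlapped temporal blocking, not the paper's GPU code: the equal-weight 0.2 stencil, the fixed edge cells, and the names `jacobi_step`, `reference` and `temporally_blocked` are assumptions made for illustration. Each tile is loaded with a halo as wide as the number of fused steps, advanced locally, and only its interior is stored.

```python
def jacobi_step(g):
    """One 5-point Jacobi sweep; cells on the edge of `g` are held fixed."""
    h, w = len(g), len(g[0])
    out = [row[:] for row in g]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = 0.2 * (g[i][j] + g[i - 1][j] + g[i + 1][j]
                               + g[i][j - 1] + g[i][j + 1])
    return out


def reference(grid, steps):
    """Baseline: sweep the whole domain once per time step."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid


def temporally_blocked(grid, steps, tile):
    """Advance each tile by `steps` time steps in a local buffer."""
    h, w = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    for ti in range(0, h, tile):
        for tj in range(0, w, tile):
            # Load the tile plus a halo of width `steps`, clamped to the domain.
            i0, i1 = max(ti - steps, 0), min(ti + tile + steps, h)
            j0, j1 = max(tj - steps, 0), min(tj + tile + steps, w)
            local = [row[j0:j1] for row in grid[i0:i1]]
            # Stale halo values creep inward one cell per step, so after
            # `steps` sweeps the tile interior is still exact.
            for _ in range(steps):
                local = jacobi_step(local)
            # Store only the tile itself back to the "global" grid.
            for i in range(ti, min(ti + tile, h)):
                for j in range(tj, min(tj + tile, w)):
                    result[i][j] = local[i - i0][j - j0]
    return result
```

The halo is recomputed redundantly by neighbouring tiles, so larger tiles amortize that overhead better, which is one motivation for sizing tiles to the available scratchpad.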
Tunable violet radiation in a quasi-phase-matched periodically poled stoichiometric lithium tantalate waveguide by direct femtosecond laser writing
We report on violet-light generation using femtosecond-laser-written waveguides in periodically poled MgO:LiTaO3 crystal under conditions of third-order quasi-phase matching. Ten parallel depressed cladding waveguides are successfully fabricated with different grating periods in the same sample with fan-out χ(2) grating structures. These waveguides exhibit high optical quality with minimum insertion loss as low as 0.71 dB. Temperature- and wavelength-tuned second harmonic generation for different waveguides is demonstrated by using a tunable CW Ti:sapphire laser. Tunable violet second harmonic light has been generated with a single period over the range of 396 nm to 401 nm by varying the crystal temperature from 60 °C to 200 °C. At the quasi-phase matching temperature, 0.37 mW of violet light power at 397.2 nm is generated for a fundamental power of 336.7 mW, corresponding to a normalized conversion efficiency of 0.39%/(W·cm²). Our work contributes to designing tunable and efficient on-chip violet light sources based on femtosecond-laser-written waveguides. This work was supported by the National Natural Science Foundation of China (Nos. 11874239 and 61775120); Major Program of Shandong Province Natural Science Foundation (Grant No. ZR2018ZB0649); National Key Research and Development Project (No. SQ2019YFA070063-01); MINECO (FIS2017-87970-R); and Ministerio de Economía y Competitividad de España (MAT2016-75362-C3-1-R).
Second harmonic generation of femtosecond laser written depressed cladding waveguides in periodically poled MgO:LiTaO3 crystal
We report on the fabrication of depressed cladding waveguides in periodically poled MgO doped LiTaO3 by using low-repetition-rate femtosecond laser writing, and their use for guided-wave second harmonic generation (SHG). The cladding waveguides exhibit different guiding performance along the extraordinary and ordinary polarizations. The temperature-dependent quasi-phase-matching (QPM) is realized to obtain SHG in the depressed cladding waveguides. The results show that the QPM temperature was dependent on the poling period and on the features of the cladding waveguides. The highest nonlinear conversion efficiency (0.74% W⁻¹ cm⁻²) was found in the waveguide fabricated with large scanning velocity (0.75 mm/s) and small radius (15 μm). National Natural Science Foundation of China (NSFC) (61775120, 11874239); Junta de Castilla y León (Project SA046U16); Spanish Ministerio de Economía y Competitividad (MINECO, FIS2013-44174-P, FIS2015-71933-REDT).
- …